{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       "@import url('http://fonts.googleapis.com/css?family=Source+Code+Pro');\n",
       "@import url('http://fonts.googleapis.com/css?family=Vollkorn');\n",
       "@import url('http://fonts.googleapis.com/css?family=Arimo');\n",
       "@import url('http://fonts.googleapis.com/css?family=Fira_sans');\n",
       "\n",
       "    div.cell{\n",
       "        width: 900px;\n",
       "        margin-left: 0% !important;\n",
       "        margin-right: auto;\n",
       "    }\n",
       "    div.text_cell code {\n",
       "        background: transparent;\n",
       "        color: #000000;\n",
       "        font-weight: 600;\n",
       "        font-size: 11pt;\n",
       "        font-style: bold;\n",
       "        font-family:  'Source Code Pro', Consolas, monocco, monospace;\n",
       "   }\n",
       "    h1 {\n",
       "        font-family: 'Open sans',verdana,arial,sans-serif;\n",
       "\t}\n",
       "\t\n",
       "    div.input_area {\n",
       "        background: #F6F6F9;\n",
       "        border: 1px solid #586e75;\n",
       "    }\n",
       "\n",
       "    .text_cell_render h1 {\n",
       "        font-weight: 200;\n",
       "        font-size: 30pt;\n",
       "        line-height: 100%;\n",
       "        color:#c76c0c;\n",
       "        margin-bottom: 0.5em;\n",
       "        margin-top: 1em;\n",
       "        display: block;\n",
       "        white-space: wrap;\n",
       "        text-align: left;\n",
       "    } \n",
       "    h2 {\n",
       "        font-family: 'Open sans',verdana,arial,sans-serif;\n",
       "        text-align: left;\n",
       "    }\n",
       "    .text_cell_render h2 {\n",
       "        font-weight: 200;\n",
       "        font-size: 16pt;\n",
       "        font-style: italic;\n",
       "        line-height: 100%;\n",
       "        color:#c76c0c;\n",
       "        margin-bottom: 0.5em;\n",
       "        margin-top: 1.5em;\n",
       "        display: block;\n",
       "        white-space: wrap;\n",
       "        text-align: left;\n",
       "    } \n",
       "    h3 {\n",
       "        font-family: 'Open sans',verdana,arial,sans-serif;\n",
       "    }\n",
       "    .text_cell_render h3 {\n",
       "        font-weight: 200;\n",
       "        font-size: 14pt;\n",
       "        line-height: 100%;\n",
       "        color:#d77c0c;\n",
       "        margin-bottom: 0.5em;\n",
       "        margin-top: 2em;\n",
       "        display: block;\n",
       "        white-space: wrap;\n",
       "        text-align: left;\n",
       "    }\n",
       "    h4 {\n",
       "        font-family: 'Open sans',verdana,arial,sans-serif;\n",
       "    }\n",
       "    .text_cell_render h4 {\n",
       "        font-weight: 100;\n",
       "        font-size: 14pt;\n",
       "        color:#d77c0c;\n",
       "        margin-bottom: 0.5em;\n",
       "        margin-top: 0.5em;\n",
       "        display: block;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    h5 {\n",
       "        font-family: 'Open sans',verdana,arial,sans-serif;\n",
       "    }\n",
       "    .text_cell_render h5 {\n",
       "        font-weight: 200;\n",
       "        font-style: normal;\n",
       "        color: #1d3b84;\n",
       "        font-size: 16pt;\n",
       "        margin-bottom: 0em;\n",
       "        margin-top: 0.5em;\n",
       "        display: block;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    div.text_cell_render{\n",
       "        font-family: 'Fira sans', verdana,arial,sans-serif;\n",
       "        line-height: 125%;\n",
       "        font-size: 115%;\n",
       "        text-align:justify;\n",
       "        text-justify:inter-word;\n",
       "    }\n",
       "    div.output_subarea.output_text.output_pyout {\n",
       "        overflow-x: auto;\n",
       "        overflow-y: scroll;\n",
       "        max-height: 50000px;\n",
       "    }\n",
       "    div.output_subarea.output_stream.output_stdout.output_text {\n",
       "        overflow-x: auto;\n",
       "        overflow-y: scroll;\n",
       "        max-height: 50000px;\n",
       "    }\n",
       "    div.output_wrapper{\n",
       "        margin-top:0.2em;\n",
       "        margin-bottom:0.2em;\n",
       "}\n",
       "\n",
       "    code{\n",
       "      font-size: 70%;\n",
       "    }\n",
       "    .rendered_html code{\n",
       "    background-color: transparent;\n",
       "    }\n",
       "    ul{\n",
       "        margin: 2em;\n",
       "    }\n",
       "    ul li{\n",
       "        padding-left: 0.5em; \n",
       "        margin-bottom: 0.5em; \n",
       "        margin-top: 0.5em; \n",
       "    }\n",
       "    ul li li{\n",
       "        padding-left: 0.2em; \n",
       "        margin-bottom: 0.2em; \n",
       "        margin-top: 0.2em; \n",
       "    }\n",
       "    ol{\n",
       "        margin: 2em;\n",
       "    }\n",
       "    ol li{\n",
       "        padding-left: 0.5em; \n",
       "        margin-bottom: 0.5em; \n",
       "        margin-top: 0.5em; \n",
       "    }\n",
       "    ul li{\n",
       "        padding-left: 0.5em; \n",
       "        margin-bottom: 0.5em; \n",
       "        margin-top: 0.2em; \n",
       "    }\n",
       "    a:link{\n",
       "       font-weight: bold;\n",
       "       color:#447adb;\n",
       "    }\n",
       "    a:visited{\n",
       "       font-weight: bold;\n",
       "       color: #1d3b84;\n",
       "    }\n",
       "    a:hover{\n",
       "       font-weight: bold;\n",
       "       color: #1d3b84;\n",
       "    }\n",
       "    a:focus{\n",
       "       font-weight: bold;\n",
       "       color:#447adb;\n",
       "    }\n",
       "    a:active{\n",
       "       font-weight: bold;\n",
       "       color:#447adb;\n",
       "    }\n",
       "    .rendered_html :link {\n",
       "       text-decoration: underline; \n",
       "    }\n",
       "    .rendered_html :hover {\n",
       "       text-decoration: none; \n",
       "    }\n",
       "    .rendered_html :visited {\n",
       "      text-decoration: none;\n",
       "    }\n",
       "    .rendered_html :focus {\n",
       "      text-decoration: none;\n",
       "    }\n",
       "    .rendered_html :active {\n",
       "      text-decoration: none;\n",
       "    }\n",
       "    .warning{\n",
       "        color: rgb( 240, 20, 20 )\n",
       "    } \n",
       "    hr {\n",
       "      color: #f3f3f3;\n",
       "      background-color: #f3f3f3;\n",
       "      height: 1px;\n",
       "    }\n",
       "    blockquote{\n",
       "      display:block;\n",
       "      background: #fcfcfc;\n",
       "      border-left: 5px solid #c76c0c;\n",
       "      font-family: 'Open sans',verdana,arial,sans-serif;\n",
       "      width:680px;\n",
       "      padding: 10px 10px 10px 10px;\n",
       "      text-align:justify;\n",
       "      text-justify:inter-word;\n",
       "      }\n",
       "      blockquote p {\n",
       "        margin-bottom: 0;\n",
       "        line-height: 125%;\n",
       "        font-size: 100%;\n",
       "      }\n",
       "</style>\n",
       "<script>\n",
       "    MathJax.Hub.Config({\n",
       "                        TeX: {\n",
       "                           extensions: [\"AMSmath.js\"]\n",
       "                           },\n",
       "                tex2jax: {\n",
       "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
       "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
       "                },\n",
       "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
       "                \"HTML-CSS\": {\n",
       "                    scale:100,\n",
       "                        availableFonts: [],\n",
       "                        preferredFont:null,\n",
       "                        webFont: \"TeX\",\n",
       "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
       "                }\n",
       "        });\n",
       "</script>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import sys\n",
    "from __future__ import division, print_function\n",
    "import sys\n",
    "sys.path.insert(0,'../code')\n",
    "import book_format\n",
    "book_format.load_style('../code')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Computational Statistics\n",
    "\n",
    "## Distributions\n",
    "\n",
    "In statistics a <span>**distribution**</span> is a set of values and\n",
    "their corresponding probabilities.\n",
    "\n",
    "For example, if you roll a six-sided die, the set of possible values is\n",
    "the numbers 1 to 6, and the probability associated with each value is\n",
    "1/6.\n",
    "\n",
    "As another example, you might be interested in how many times each word\n",
    "appears in common English usage. You could build a distribution that\n",
    "includes each word and how many times it appears.\n",
    "\n",
    "To represent a distribution in Python, you could use a dictionary that\n",
    "maps from each value to its probability. I have written a class called\n",
    "<span>Pmf</span> that uses a Python dictionary in exactly that way, and\n",
    "provides a number of useful methods. I called the class Pmf in reference\n",
    "to a <span>**probability mass function**</span>, which is a way to\n",
    "represent a distribution mathematically.\n",
    "\n",
    "<span>Pmf</span> is defined in a Python module I wrote to accompany this\n",
    "book, <span>thinkbayes.py</span>. You can download it from\n",
    "<http://thinkbayes.com/thinkbayes.py>. For more information see\n",
    "Section [download].\n",
    "\n",
    "To use <span>Pmf</span> you can import it like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from thinkbayes import Pmf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code builds a Pmf to represent the distribution of\n",
    "outcomes for a six-sided die:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 0.16666666666666666\n",
      "2 0.16666666666666666\n",
      "3 0.16666666666666666\n",
      "4 0.16666666666666666\n",
      "5 0.16666666666666666\n",
      "6 0.16666666666666666\n"
     ]
    }
   ],
   "source": [
    "pmf = Pmf()\n",
    "for x in [1,2,3,4,5,6]:\n",
    "    pmf.Set(x, 1/6.0)\n",
    "pmf.Print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Pmf` creates an empty Pmf with no values. The `Set` method sets the\n",
    "probability associated with each value to $1/6$.\n",
    "\n",
    "Here’s another example that counts the number of times each word appears\n",
    "in a sequence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bye 1\n",
      "football 1\n",
      "hi 2\n",
      "sky 1\n",
      "the 1\n"
     ]
    }
   ],
   "source": [
    "word_list = ['hi', 'the', 'bye', 'hi', 'football', 'sky']\n",
    "pmf = Pmf()\n",
    "for word in word_list:\n",
    "    pmf.Incr(word, 1)\n",
    "pmf.Print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Incr` increases the “probability” associated with each word by 1. If a\n",
    "word is not already in the Pmf, it is added.\n",
    "\n",
    "I put “probability” in quotes because in this example, the probabilities\n",
    "are not normalized; that is, they do not add up to 1. So they are not\n",
    "true probabilities.\n",
    "\n",
    "But in this example the word counts are proportional to the\n",
    "probabilities. So after we count all the words, we can compute\n",
    "probabilities by dividing through by the total number of words.\n",
    "<span>Pmf</span> provides a method, `Normalize`, that does exactly that:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bye 0.16666666666666666\n",
      "football 0.16666666666666666\n",
      "hi 0.3333333333333333\n",
      "sky 0.16666666666666666\n",
      "the 0.16666666666666666\n"
     ]
    }
   ],
   "source": [
    "pmf.Normalize()\n",
    "pmf.Print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once you have a Pmf object, you can ask for the probability associated\n",
    "with any value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.16666666666666666\n"
     ]
    }
   ],
   "source": [
    "print(pmf.Prob('the'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And that would print the frequency of the word “the” as a fraction of\n",
    "the words in the list.\n",
    "\n",
    "Pmf uses a Python dictionary to store the values and their\n",
    "probabilities, so the values in the Pmf can be any hashable type. The\n",
    "probabilities can be any numerical type, but they are usually\n",
    "floating-point numbers (type `float`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The cookie problem\n",
    "\n",
    "In the context of Bayes’s theorem, it is natural to use a Pmf to map\n",
    "from each hypothesis to its probability. In the cookie problem, the\n",
    "hypotheses are $B_1$ and $B_2$. In Python, I represent them with\n",
    "strings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "pmf = Pmf()\n",
    "pmf.Set('Bowl 1', 0.5)\n",
    "pmf.Set('Bowl 2', 0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This distribution, which contains the priors for each hypothesis, is\n",
    "called (wait for it) the <span>**prior distribution**</span>.\n",
    "\n",
    "To update the distribution based on new data (the vanilla cookie), we\n",
    "multiply each prior by the corresponding likelihood. The likelihood of\n",
    "drawing a vanilla cookie from Bowl 1 is 3/4. The likelihood for Bowl 2\n",
    "is 1/2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "pmf.Mult('Bowl 1', 0.75)\n",
    "pmf.Mult('Bowl 2', 0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Mult` does what you would expect. It gets the probability for the given\n",
    "hypothesis and multiplies by the given likelihood.\n",
    "\n",
    "After this update, the distribution is no longer normalized, but because\n",
    "these hypotheses are mutually exclusive and collectively exhaustive, we\n",
    "can <span>**renormalize**</span>:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.625"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pmf.Normalize()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is a distribution that contains the posterior probability for\n",
    "each hypothesis, which is called (wait now) the <span>**posterior\n",
    "distribution**</span>.\n",
    "\n",
    "Finally, we can get the posterior probability for Bowl 1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.6000000000000001\n"
     ]
    }
   ],
   "source": [
    "print(pmf.Prob('Bowl 1'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the answer is 0.6. You can download this example from\n",
    "<http://thinkbayes.com/cookie.py>. For more information see\n",
    "Section [download]."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Bayesian framework\n",
    "\n",
    "Before we go on to other problems, I want to rewrite the code from the\n",
    "previous section to make it more general. First I’ll define a class to\n",
    "encapsulate the code related to this problem:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "class Cookie(Pmf):\n",
    "    \"\"\"A map from string bowl ID to probablity.\"\"\"\n",
    "\n",
    "    def __init__(self, hypos):\n",
    "        \"\"\"Initialize self.\n",
    "\n",
    "        hypos: sequence of string bowl IDs\n",
    "        \"\"\"\n",
    "        Pmf.__init__(self)\n",
    "        for hypo in hypos:\n",
    "            self.Set(hypo, 1)\n",
    "        self.Normalize()\n",
    "\n",
    "    def Update(self, data):\n",
    "        \"\"\"Updates the PMF with new data.\n",
    "\n",
    "        data: string cookie type\n",
    "        \"\"\"\n",
    "        for hypo in self.Values():\n",
    "            like = self.Likelihood(data, hypo)\n",
    "            self.Mult(hypo, like)\n",
    "        self.Normalize()\n",
    "\n",
    "    mixes = {\n",
    "        'Bowl 1':dict(vanilla=0.75, chocolate=0.25),\n",
    "        'Bowl 2':dict(vanilla=0.5, chocolate=0.5),\n",
    "        }\n",
    "\n",
    "    def Likelihood(self, data, hypo):\n",
    "        \"\"\"The likelihood of the data under the hypothesis.\n",
    "\n",
    "        data: string cookie type\n",
    "        hypo: string bowl ID\n",
    "        \"\"\"\n",
    "        mix = self.mixes[hypo]\n",
    "        like = mix[data]\n",
    "        return like"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A Cookie object is a Pmf that maps from hypotheses to their\n",
    "probabilities. The `__init__` method gives each hypothesis the same\n",
    "prior probability. As in the previous section, there are two hypotheses:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hypos = ['Bowl 1', 'Bowl 2']\n",
    "pmf = Cookie(hypos)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Cookie` provides an `Update` method that takes data as a parameter and\n",
    "updates the probabilities. `Update` loops through each hypothesis in the suite and multiplies its\n",
    "probability by the likelihood of the data under the hypothesis, which is\n",
    "computed by `Likelihood`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Likelihood` uses `mixes`, which is a dictionary that maps from the name\n",
    "of a bowl to the mix of cookies in the bowl.\n",
    "\n",
    "Here’s what the update looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "pmf.Update('vanilla')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then we can print the posterior probability of each hypothesis:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bowl 1 0.6000000000000001\n",
      "Bowl 2 0.4\n"
     ]
    }
   ],
   "source": [
    "for hypo, prob in pmf.Items():\n",
    "    print(hypo, prob)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "which is the same as what we got before. This code is more complicated\n",
    "than what we saw in the previous section. One advantage is that it\n",
    "generalizes to the case where we draw more than one cookie from the same\n",
    "bowl (with replacement):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bowl 1 0.627906976744186\n",
      "Bowl 2 0.37209302325581395\n"
     ]
    }
   ],
   "source": [
    "dataset = ['vanilla', 'chocolate', 'vanilla']\n",
    "for data in dataset:\n",
    "    pmf.Update(data)\n",
    "pmf.Print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The other advantage is that it provides a framework for solving many\n",
    "similar problems. In the next section we’ll solve the Monty Hall problem\n",
    "computationally and then see what parts of the framework are the same.\n",
    "\n",
    "The code in this section is available from\n",
    "<http://thinkbayes.com/cookie2.py>. For more information see\n",
    "Section [download]."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Monty Hall problem\n",
    "\n",
    "To solve the Monty Hall problem, I’ll define a new class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class Monty(Pmf):\n",
    "    \"\"\"Map from string location of car to probability\"\"\"\n",
    "\n",
    "    def __init__(self, hypos):\n",
    "        \"\"\"Initialize the distribution.\n",
    "\n",
    "        hypos: sequence of hypotheses\n",
    "        \"\"\"\n",
    "        Pmf.__init__(self)\n",
    "        for hypo in hypos:\n",
    "            self.Set(hypo, 1)\n",
    "        self.Normalize()\n",
    "\n",
    "    def Update(self, data):\n",
    "        \"\"\"Updates each hypothesis based on the data.\n",
    "\n",
    "        data: any representation of the data\n",
    "        \"\"\"\n",
    "        for hypo in self.Values():\n",
    "            like = self.Likelihood(data, hypo)\n",
    "            self.Mult(hypo, like)\n",
    "        self.Normalize()\n",
    "\n",
    "    def Likelihood(self, data, hypo):\n",
    "        \"\"\"Compute the likelihood of the data under the hypothesis.\n",
    "\n",
    "        hypo: string name of the door where the prize is\n",
    "        data: string name of the door Monty opened\n",
    "        \"\"\"\n",
    "        if hypo == data:\n",
    "            return 0\n",
    "        elif hypo == 'A':\n",
    "            return 0.5\n",
    "        else:\n",
    "            return 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So far `Monty` and `Cookie` are nearly the same. And the code that\n",
    "creates the Pmf is the same, too, except for the names of the\n",
    "hypotheses:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "hypos = 'ABC'\n",
    "pmf = Monty(hypos)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling `Update` is pretty much the same:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data = 'B'\n",
    "pmf.Update(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the implementation of `Update` is exactly the same. The only part that requires some work is `Likelihood`:\n",
    "\n",
    "```python\n",
    "def Likelihood(self, data, hypo):\n",
    "    if hypo == data:\n",
    "        return 0\n",
    "    elif hypo == 'A':\n",
    "        return 0.5\n",
    "    else:\n",
    "        return 1\n",
    "```\n",
    "Finally, printing the results is the same:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "B 0.0\n",
      "A 0.3333333333333333\n",
      "C 0.6666666666666666\n"
     ]
    }
   ],
   "source": [
    "for hypo, prob in pmf.Items():\n",
    "    print(hypo, prob)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, writing `Likelihood` is a little complicated, but the\n",
    "framework of the Bayesian update is simple. The code in this section is\n",
    "available from <http://thinkbayes.com/monty.py>. For more information\n",
    "see Section [download]."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Encapsulating the framework\n",
    "\n",
    "Now that we see what elements of the framework are the same, we can\n",
    "encapsulate them in an object—a `Suite` is a `Pmf` that provides\n",
    "`__init__`, `Update`, and `Print`:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "class Suite(Pmf):\n",
    "        \"\"\"Represents a suite of hypotheses and their probabilities.\"\"\"\n",
    "\n",
    "        def __init__(self, hypo=tuple()):\n",
    "            \"\"\"Initializes the distribution.\"\"\"\n",
    "\n",
    "        def Update(self, data):\n",
    "            \"\"\"Updates each hypothesis based on the data.\"\"\"\n",
    "\n",
    "        def Print(self):\n",
    "            \"\"\"Prints the hypotheses and their probabilities.\"\"\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The implementation of `Suite` is in `thinkbayes.py`. To use `Suite`, you\n",
    "should write a class that inherits from it and provides `Likelihood`.\n",
    "For example, here is the solution to the Monty Hall problem rewritten to\n",
    "use `Suite`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from thinkbayes import Suite\n",
    "\n",
    "class Monty(Suite):\n",
    "\n",
    "    def Likelihood(self, data, hypo):\n",
    "        if hypo == data:\n",
    "            return 0\n",
    "        elif hypo == 'A':\n",
    "            return 0.5\n",
    "        else:\n",
    "            return 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And here’s the code that uses this class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A 0.3333333333333333\n",
      "B 0.0\n",
      "C 0.6666666666666666\n"
     ]
    }
   ],
   "source": [
    "suite = Monty('ABC')\n",
    "suite.Update('B')\n",
    "suite.Print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can download this example from <http://thinkbayes.com/monty2.py>.\n",
    "For more information see Section [download]."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The <span>M&M</span> problem\n",
    "\n",
    "We can use the `Suite` framework to solve the <span>M&M</span> problem.\n",
    "Writing the `Likelihood` function is tricky, but everything else is\n",
    "straightforward."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class M_and_M(Suite):\n",
    "    \"\"\"Map from hypothesis (A or B) to probability.\"\"\"\n",
    "\n",
    "    mix94 = dict(brown=30,\n",
    "                 yellow=20,\n",
    "                 red=20,\n",
    "                 green=10,\n",
    "                 orange=10,\n",
    "                 tan=10)\n",
    "\n",
    "    mix96 = dict(blue=24,\n",
    "                 green=20,\n",
    "                 orange=16,\n",
    "                 yellow=14,\n",
    "                 red=13,\n",
    "                 brown=13)\n",
    "\n",
    "    hypoA = dict(bag1=mix94, bag2=mix96)\n",
    "    hypoB = dict(bag1=mix96, bag2=mix94)\n",
    "\n",
    "    hypotheses = dict(A=hypoA, B=hypoB)\n",
    "\n",
    "    def Likelihood(self, data, hypo):\n",
    "        \"\"\"Computes the likelihood of the data under the hypothesis.\n",
    "\n",
    "        hypo: string hypothesis (A or B)\n",
    "        data: tuple of string bag, string color\n",
    "        \"\"\"\n",
    "        bag, color = data\n",
    "        mix = self.hypotheses[hypo][bag]\n",
    "        like = mix[color]\n",
    "        return like"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First I need to encode the color mixes from before and after 1995:\n",
    "\n",
    "```python\n",
    "mix94 = dict(brown=30,\n",
    "             yellow=20,\n",
    "             ...\n",
    "\n",
    "mix96 = dict(blue=24,\n",
    "             green=20,\n",
    "             ...\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then I have to encode the hypotheses:\n",
    "\n",
    "```python\n",
    "hypoA = dict(bag1=mix94, bag2=mix96)\n",
    "hypoB = dict(bag1=mix96, bag2=mix94)\n",
    "```\n",
    "\n",
    "`hypoA` represents the hypothesis that Bag 1 is from 1994 and Bag 2 from 1996. `hypoB` is the other way around.\n",
    "\n",
    "Next I map from the name of the hypothesis to the representation:\n",
    "\n",
    "```python\n",
    "hypotheses = dict(A=hypoA, B=hypoB)\n",
    "```\n",
    "\n",
    "And finally I can write `Likelihood`. In this case the hypothesis,\n",
    "`hypo`, is a string, either `A` or `B`. The data is a tuple that\n",
    "specifies a bag and a color.\n",
    "\n",
    "        def Likelihood(self, data, hypo):\n",
    "            bag, color = data\n",
    "            mix = self.hypotheses[hypo][bag]\n",
    "            like = mix[color]\n",
    "            return like"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here’s the code that creates the suite and updates it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A 0.7407407407407407\n",
      "B 0.2592592592592592\n"
     ]
    }
   ],
   "source": [
    "suite = M_and_M('AB')\n",
    "\n",
    "suite.Update(('bag1', 'yellow'))\n",
    "suite.Update(('bag2', 'green'))\n",
    "\n",
    "suite.Print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The posterior probability of A is approximately $20/27$, which is what\n",
    "we got before.\n",
    "\n",
    "The code in this section is available from\n",
    "<http://thinkbayes.com/m_and_m.py>. For more information see\n",
    "Section [download]."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Discussion\n",
    "\n",
    "This chapter presents the Suite class, which encapsulates the Bayesian\n",
    "update framework.\n",
    "\n",
    "<span>Suite</span> is an <span>**abstract type**</span>, which means\n",
    "that it defines the interface a Suite is supposed to have, but does not\n",
    "provide a complete implementation. The <span>Suite</span> interface\n",
    "includes <span>Update</span> and <span>Likelihood</span>, but the\n",
    "<span>Suite</span> class only provides an implementation of\n",
    "<span>Update</span>, not <span>Likelihood</span>.\n",
    "\n",
    "A <span>**concrete type**</span> is a class that extends an abstract\n",
    "parent class and provides an implementation of the missing methods. For\n",
    "example, <span>Monty</span> extends <span>Suite</span>, so it inherits\n",
    "<span>Update</span> and provides <span>Likelihood</span>.\n",
    "\n",
    "If you are familiar with design patterns, you might recognize this as an\n",
    "example of the template method pattern. You can read about this pattern\n",
    "at <http://en.wikipedia.org/wiki/Template_method_pattern>.\n",
    "\n",
    "Most of the examples in the following chapters follow the same pattern;\n",
    "for each problem we define a new class that extends <span>Suite</span>,\n",
    "inherits <span>Update</span>, and provides <span>Likelihood</span>. In a\n",
    "few cases we override <span>Update</span>, usually to improve\n",
    "performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "In Section [framework] I said that the solution to the cookie problem\n",
    "generalizes to the case where we draw multiple cookies with replacement.\n",
    "\n",
    "But in the more likely scenario where we eat the cookies we draw, the\n",
    "likelihood of each draw depends on the previous draws.\n",
    "\n",
    "Modify the solution in this chapter to handle selection without\n",
    "replacement. Hint: add instance variables to <span>Cookie</span> to\n",
    "represent the hypothetical state of the bowls, and modify\n",
    "<span>Likelihood</span> accordingly. You might want to define a\n",
    "<span>Bowl</span> object."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
